{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>\n",
    "\n",
    "<i>Licensed under the MIT License.</i>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neural Collaborative Filtering (NCF)\n",
    "\n",
    "This notebook serves as an introduction to Neural Collaborative Filtering (NCF), which is an innovative algorithm based on deep neural networks to tackle the key problem in recommendation — collaborative filtering — on the basis of implicit feedback."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0 Global Settings and Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "System version: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n",
      "[GCC 7.3.0]\n",
      "Pandas version: 0.24.2\n",
      "Tensorflow version: 1.12.0\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "sys.path.append(\"../../\")\n",
    "import os\n",
    "import shutil\n",
    "import papermill as pm\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "\n",
    "from reco_utils.common.timer import Timer\n",
    "from reco_utils.recommender.ncf.ncf_singlenode import NCF\n",
    "from reco_utils.recommender.ncf.dataset import Dataset as NCFDataset\n",
    "from reco_utils.dataset import movielens\n",
    "from reco_utils.dataset.python_splitters import python_chrono_split\n",
    "from reco_utils.evaluation.python_evaluation import (rmse, mae, rsquared, exp_var, map_at_k, ndcg_at_k, precision_at_k, \n",
    "                                                     recall_at_k, get_top_k_items)\n",
    "from reco_utils.common.constants import SEED as DEFAULT_SEED\n",
    "\n",
    "\n",
    "print(\"System version: {}\".format(sys.version))\n",
    "print(\"Pandas version: {}\".format(pd.__version__))\n",
    "print(\"Tensorflow version: {}\".format(tf.__version__))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# top k items to recommend\n",
    "TOP_K = 10\n",
    "\n",
    "# Select MovieLens data size: 100k, 1m, 10m, or 20m\n",
    "MOVIELENS_DATA_SIZE = '100k'\n",
    "\n",
    "# Model parameters\n",
    "EPOCHS = 100\n",
    "BATCH_SIZE = 256\n",
    "\n",
    "SEED = DEFAULT_SEED  # Set None for non-deterministic results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 Matrix factorization algorithm\n",
    "\n",
    "NCF is new neural matrix factorization model, which ensembles Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP) to unify the strengths of linearity of MF and non-linearity of MLP for modelling the user–item latent structures. NCF can be demonstrated as a framework for GMF and MLP, which is illustrated as below:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://recodatasets.blob.core.windows.net/images/NCF.svg?sanitize=true\">\n",
    "\n",
    "This figure shows how to utilize latent vectors of items and users, and then how to fuse outputs from GMF Layer (left) and MLP Layer (right). We will introduce this framework and show how to learn the model parameters in following sections.\n",
    "\n",
    "### 1.1 The GMF model\n",
    "\n",
    "In ALS, the ratings are modeled as follows:\n",
    "\n",
    "$$\\hat { r } _ { u , i } = q _ { i } ^ { T } p _ { u }$$\n",
    "\n",
    "GMF introduces neural CF layer as the output layer of standard MF. In this way, MF can be easily generalized\n",
    "and extended. For example, if we allow the edge weights of this output layer to be learnt from data without the uniform constraint, it will result in a variant of MF that allows varying importance of latent dimensions. And if we use a non-linear function for activation, it will generalize MF to a non-linear setting which might be more expressive than the linear MF model. GMF can be shown as follows:\n",
    "\n",
    "$$\\hat { r } _ { u , i } = a _ { o u t } \\left( h ^ { T } \\left( q _ { i } \\odot p _ { u } \\right) \\right)$$\n",
    "\n",
    "where $\\odot$ is element-wise product of vectors. Additionally, ${a}_{out}$ and ${h}$ denote the activation function and edge weights of the output layer respectively. MF can be interpreted as a special case of GMF. Intuitively, if we use an identity function for aout and enforce h to be a uniform vector of 1, we can exactly recover the MF model.\n",
    "\n",
    "### 1.2 The MLP model\n",
    "\n",
    "NCF adopts two pathways to model users and items: 1) element-wise product of vectors, 2) concatenation of vectors. To learn interactions after concatenating of users and items latent features, the standard MLP model is applied. In this sense, we can endow the model a large level of flexibility and non-linearity to learn the interactions between $p_{u}$ and $q_{i}$. The details of MLP model are:\n",
    "\n",
    "For the input layer, there is concatention of user and item vectors:\n",
    "\n",
    "$$z _ { 1 } = \\phi _ { 1 } \\left( p _ { u } , q _ { i } \\right) = \\left[ \\begin{array} { c } { p _ { u } } \\\\ { q _ { i } } \\end{array} \\right]$$\n",
    "\n",
    "So for the hidden layers and output layer of MLP, the details are:\n",
    "\n",
    "$$\n",
    "\\phi _ { l } \\left( z _ { l } \\right) = a _ { o u t } \\left( W _ { l } ^ { T } z _ { l } + b _ { l } \\right) , ( l = 2,3 , \\ldots , L - 1 )\n",
    "$$\n",
    "\n",
    "and:\n",
    "\n",
    "$$\n",
    "\\hat { r } _ { u , i } = \\sigma \\left( h ^ { T } \\phi \\left( z _ { L - 1 } \\right) \\right)\n",
    "$$\n",
    "\n",
    "where ${ W }_{ l }$, ${ b }_{ l }$, and ${ a }_{ out }$ denote the weight matrix, bias vector, and activation function for the $l$-th layer’s perceptron, respectively. For activation functions of MLP layers, one can freely choose sigmoid, hyperbolic tangent (tanh), and Rectifier (ReLU), among others. Because of binary data task, the activation function of the output layer is defined as sigmoid $\\sigma(x)=\\frac{1}{1+e^{-x}}$ to restrict the predicted score to be in (0,1).\n",
    "\n",
    "\n",
    "### 1.3 Fusion of GMF and MLP\n",
    "\n",
    "To provide more flexibility to the fused model, we allow GMF and MLP to learn separate embeddings, and combine the two models by concatenating their last hidden layer. We get $\\phi^{GMF}$ from GMF:\n",
    "\n",
    "$$\\phi _ { u , i } ^ { G M F } = p _ { u } ^ { G M F } \\odot q _ { i } ^ { G M F }$$\n",
    "\n",
    "and obtain $\\phi^{MLP}$ from MLP:\n",
    "\n",
    "$$\\phi _ { u , i } ^ { M L P } = a _ { o u t } \\left( W _ { L } ^ { T } \\left( a _ { o u t } \\left( \\ldots a _ { o u t } \\left( W _ { 2 } ^ { T } \\left[ \\begin{array} { c } { p _ { u } ^ { M L P } } \\\\ { q _ { i } ^ { M L P } } \\end{array} \\right] + b _ { 2 } \\right) \\ldots \\right) \\right) + b _ { L }\\right.$$\n",
    "\n",
    "Lastly, we fuse output from GMF and MLP:\n",
    "\n",
    "$$\\hat { r } _ { u , i } = \\sigma \\left( h ^ { T } \\left[ \\begin{array} { l } { \\phi ^ { G M F } } \\\\ { \\phi ^ { M L P } } \\end{array} \\right] \\right)$$\n",
    "\n",
    "This model combines the linearity of MF and non-linearity of DNNs for modelling user–item latent structures.\n",
    "\n",
    "### 1.4 Objective Function\n",
    "\n",
    "We define the likelihood function as:\n",
    "\n",
    "$$P \\left( \\mathcal { R } , \\mathcal { R } ^ { - } | \\mathbf { P } , \\mathbf { Q } , \\Theta \\right) = \\prod _ { ( u , i ) \\in \\mathcal { R } } \\hat { r } _ { u , i } \\prod _ { ( u , j ) \\in \\mathcal { R } ^{ - } } \\left( 1 - \\hat { r } _ { u , j } \\right)$$\n",
    "\n",
    "Where $\\mathcal{R}$ denotes the set of observed interactions, and $\\mathcal{ R } ^ { - }$ denotes the set of negative instances. $\\mathbf{P}$ and $\\mathbf{Q}$ denotes the latent factor matrix for users and items, respectively; and $\\Theta$ denotes the model parameters. Taking the negative logarithm of the likelihood, we obatain the objective function to minimize for NCF method, which is known as [binary cross-entropy loss](https://en.wikipedia.org/wiki/Cross_entropy):\n",
    "\n",
    "$$L = - \\sum _ { ( u , i ) \\in \\mathcal { R } \\cup { \\mathcal { R } } ^ { - } } r _ { u , i } \\log \\hat { r } _ { u , i } + \\left( 1 - r _ { u , i } \\right) \\log \\left( 1 - \\hat { r } _ { u , i } \\right)$$\n",
    "\n",
    "The optimization can be done by performing Stochastic Gradient Descent (SGD), which is described in the [Surprise SVD deep dive notebook](../02_model/surprise_svd_deep_dive.ipynb). Our SGD method is very similar to the SVD algorithm's."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 TensorFlow implementation of NCF\n",
    "\n",
    "We will use the MovieLens dataset, which is composed of integer ratings from 1 to 5.\n",
    "\n",
    "We convert MovieLens into implicit feedback, and evaluate under our *leave-one-out* evaluation protocol.\n",
    "\n",
    "You can check the details of implementation in `reco_utils/recommender/ncf`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3 TensorFlow NCF movie recommender\n",
    "\n",
    "### 3.1 Load and split data\n",
    "\n",
    "To evaluate the performance of item recommendation, we adopted the leave-one-out evaluation.\n",
    "\n",
    "For each user, we held out his/her latest interaction as the test set and utilized the remaining data for training. We use `python_chrono_split` to achieve this. And since it is too time-consuming to rank all items for every user during evaluation, we followed the common strategy that randomly samples 100 items that are not interacted by the user, ranking the test item among the 100 items. Our test samples will be constructed by `NCFDataset`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "4.93MB [00:00, 18.4MB/s]                          \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>userID</th>\n",
       "      <th>itemID</th>\n",
       "      <th>rating</th>\n",
       "      <th>timestamp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>196</td>\n",
       "      <td>242</td>\n",
       "      <td>3.0</td>\n",
       "      <td>881250949</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>186</td>\n",
       "      <td>302</td>\n",
       "      <td>3.0</td>\n",
       "      <td>891717742</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>22</td>\n",
       "      <td>377</td>\n",
       "      <td>1.0</td>\n",
       "      <td>878887116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>244</td>\n",
       "      <td>51</td>\n",
       "      <td>2.0</td>\n",
       "      <td>880606923</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>166</td>\n",
       "      <td>346</td>\n",
       "      <td>1.0</td>\n",
       "      <td>886397596</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   userID  itemID  rating  timestamp\n",
       "0     196     242     3.0  881250949\n",
       "1     186     302     3.0  891717742\n",
       "2      22     377     1.0  878887116\n",
       "3     244      51     2.0  880606923\n",
       "4     166     346     1.0  886397596"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = movielens.load_pandas_df(\n",
    "    size=MOVIELENS_DATA_SIZE,\n",
    "    header=[\"userID\", \"itemID\", \"rating\", \"timestamp\"]\n",
    ")\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "train, test = python_chrono_split(df, 0.75)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Functions of NCF Dataset \n",
    "\n",
    "Dataset Class for NCF, where important functions are:\n",
    "\n",
    "`negative_sampling()`, sample negative user & item pair for every positive instances, with parameter `n_neg`.\n",
    "\n",
    "`train_loader(batch_size, shuffle=True)`, generate training batch with `batch_size`, also we can set whether `shuffle` this training set.\n",
    "\n",
    "`test_loader()`, generate test batch by every positive test instance, (eg. \\[1, 2, 1\\] is a positive user & item pair in test set (\\[userID, itemID, rating\\] for this tuple). This function returns like \\[\\[1, 2, 1\\], \\[1, 3, 0\\], \\[1,6, 0\\], ...\\], ie. following our *leave-one-out* evaluation protocol."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = NCFDataset(train=train, test=test, seed=SEED)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Train NCF based on TensorFlow\n",
    "The NCF has a lot of parameters. The most important ones are:\n",
    "\n",
    "`n_factors`, which controls the dimension of the latent space. Usually, the quality of the training set predictions grows with as n_factors gets higher.\n",
    "\n",
    "`layer_sizes`, sizes of input layer (and hidden layers) of MLP, input type is list.\n",
    "\n",
    "`n_epochs`, which defines the number of iteration of the SGD procedure.\n",
    "Note that both parameter also affect the training time.\n",
    "\n",
    "`model_type`, we can train single `\"MLP\"`, `\"GMF\"` or combined model `\"NCF\"` by changing the type of model.\n",
    "\n",
    "We will here set `n_factors` to `4`, `layer_sizes` to `[16,8,4]`,  `n_epochs` to `100`, `batch_size` to 256. To train the model, we simply need to call the `fit()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = NCF (\n",
    "    n_users=data.n_users, \n",
    "    n_items=data.n_items,\n",
    "    model_type=\"NeuMF\",\n",
    "    n_factors=4,\n",
    "    layer_sizes=[16,8,4],\n",
    "    n_epochs=EPOCHS,\n",
    "    batch_size=BATCH_SIZE,\n",
    "    learning_rate=1e-3,\n",
    "    verbose=10,\n",
    "    seed=SEED\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Took 662.0558905601501 seconds for training.\n"
     ]
    }
   ],
   "source": [
    "with Timer() as train_time:\n",
    "    model.fit(data)\n",
    "\n",
    "print(\"Took {} seconds for training.\".format(train_time.interval))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.4 Prediction and Evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.1 Prediction\n",
    "\n",
    "Now that our model is fitted, we can call `predict` to get some `predictions`. `predict` returns an internal object Prediction which can be easily converted back to a dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>userID</th>\n",
       "      <th>itemID</th>\n",
       "      <th>prediction</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>149.0</td>\n",
       "      <td>0.150561</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>88.0</td>\n",
       "      <td>0.307006</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>101.0</td>\n",
       "      <td>0.346554</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>110.0</td>\n",
       "      <td>0.185647</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.0</td>\n",
       "      <td>103.0</td>\n",
       "      <td>0.002877</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   userID  itemID  prediction\n",
       "0     1.0   149.0    0.150561\n",
       "1     1.0    88.0    0.307006\n",
       "2     1.0   101.0    0.346554\n",
       "3     1.0   110.0    0.185647\n",
       "4     1.0   103.0    0.002877"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predictions = [[row.userID, row.itemID, model.predict(row.userID, row.itemID)]\n",
    "               for (_, row) in test.iterrows()]\n",
    "\n",
    "\n",
    "predictions = pd.DataFrame(predictions, columns=['userID', 'itemID', 'prediction'])\n",
    "predictions.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.2 Generic Evaluation\n",
    "We remove rated movies in the top k recommendations\n",
    "To compute ranking metrics, we need predictions on all user, item pairs. We remove though the items already watched by the user, since we choose not to recommend them again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Took 2.801203966140747 seconds for prediction.\n"
     ]
    }
   ],
   "source": [
    "with Timer() as test_time:\n",
    "\n",
    "    users, items, preds = [], [], []\n",
    "    item = list(train.itemID.unique())\n",
    "    for user in train.userID.unique():\n",
    "        user = [user] * len(item) \n",
    "        users.extend(user)\n",
    "        items.extend(item)\n",
    "        preds.extend(list(model.predict(user, item, is_list=True)))\n",
    "\n",
    "    all_predictions = pd.DataFrame(data={\"userID\": users, \"itemID\":items, \"prediction\":preds})\n",
    "\n",
    "    merged = pd.merge(train, all_predictions, on=[\"userID\", \"itemID\"], how=\"outer\")\n",
    "    all_predictions = merged[merged.rating.isnull()].drop('rating', axis=1)\n",
    "\n",
    "print(\"Took {} seconds for prediction.\".format(test_time.interval))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAP:\t0.044980\n",
      "NDCG:\t0.189388\n",
      "Precision@K:\t0.170626\n",
      "Recall@K:\t0.094332\n"
     ]
    }
   ],
   "source": [
    "\n",
    "eval_map = map_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "eval_precision = precision_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "eval_recall = recall_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "\n",
    "print(\"MAP:\\t%f\" % eval_map,\n",
    "      \"NDCG:\\t%f\" % eval_ndcg,\n",
    "      \"Precision@K:\\t%f\" % eval_precision,\n",
    "      \"Recall@K:\\t%f\" % eval_recall, sep='\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4.3 \"Leave-one-out\" Evaluation\n",
    "\n",
    "We implement the functions to repoduce the leave-one-out evaluation protocol mentioned in original NCF paper.\n",
    "\n",
    "For each item in test data, we randomly samples 100 items that are not interacted by the user, ranking the test item among the 101 items (1 positive item and 100 negative items). The performance of a ranked list is judged by **Hit Ratio (HR)** and **Normalized Discounted Cumulative Gain (NDCG)**. Finally, we average the values of those ranked lists to obtain the overall HR and NDCG on test data.\n",
    "\n",
    "We truncated the ranked list at 10 for both metrics. As such, the HR intuitively measures whether the test item is present on the top-10 list, and the NDCG accounts for the position of the hit by assigning higher scores to hits at top ranks.\n",
    "\n",
    "**Note 1:** In exact leave-one-out evaluation protocol, we select only one of the latest items interacted with a user as test data for each user. But in this notebook, to compare with other algorithms, we select latest 25% dataset as test data. So this is an artificial \"leave-one-out\" evaluation only showing how to use `test_loader` and how to calculate metrics like the original paper. You can reproduce the real leave-one-out evaluation by changing the way of splitting data.\n",
    "\n",
    "**Note 2:** Because of sampling 100 negative items for each positive test item, "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HR:\t0.489403\n",
      "NDCG:\t0.385408\n"
     ]
    }
   ],
   "source": [
    "k = TOP_K\n",
    "\n",
    "ndcgs = []\n",
    "hit_ratio = []\n",
    "\n",
    "for b in data.test_loader():\n",
    "    user_input, item_input, labels = b\n",
    "    output = model.predict(user_input, item_input, is_list=True)\n",
    "\n",
    "    output = np.squeeze(output)\n",
    "    rank = sum(output >= output[0])\n",
    "    if rank <= k:\n",
    "        ndcgs.append(1 / np.log(rank + 1))\n",
    "        hit_ratio.append(1)\n",
    "    else:\n",
    "        ndcgs.append(0)\n",
    "        hit_ratio.append(0)\n",
    "\n",
    "eval_ndcg = np.mean(ndcgs)\n",
    "eval_hr = np.mean(hit_ratio)\n",
    "\n",
    "print(\"HR:\\t%f\" % eval_hr)\n",
    "print(\"NDCG:\\t%f\" % eval_ndcg)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.5 Pre-training\n",
    "\n",
    "To get better performance of NeuMF, we can adopt pre-training strategy. We first train GMF and MLP with random initializations until convergence. Then use their model parameters as the initialization for the corresponding parts of NeuMF’s parameters.  Please pay attention to the output layer, where we concatenate weights of the two models with\n",
    "\n",
    "$$h ^ { N C F } \\leftarrow \\left[ \\begin{array} { c } { \\alpha h ^ { G M F } } \\\\ { ( 1 - \\alpha ) h ^ { M L P } } \\end{array} \\right]$$\n",
    "\n",
    "where $h^{GMF}$ and $h^{MLP}$ denote the $h$ vector of the pretrained GMF and MLP model, respectively; and $\\alpha$ is a\n",
    "hyper-parameter determining the trade-off between the two pre-trained models. We set $\\alpha$ = 0.5."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.1 Training GMF and MLP model\n",
    "`model.save`, we can set the `dir_name` to store the parameters of GMF and MLP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = NCF (\n",
    "    n_users=data.n_users, \n",
    "    n_items=data.n_items,\n",
    "    model_type=\"GMF\",\n",
    "    n_factors=4,\n",
    "    layer_sizes=[16,8,4],\n",
    "    n_epochs=EPOCHS,\n",
    "    batch_size=BATCH_SIZE,\n",
    "    learning_rate=1e-3,\n",
    "    verbose=10,\n",
    "    seed=SEED\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Took 514.4518468379974 seconds for training.\n"
     ]
    }
   ],
   "source": [
    "with Timer() as train_time:\n",
    "    model.fit(data)\n",
    "\n",
    "print(\"Took {} seconds for training.\".format(train_time.interval))\n",
    "\n",
    "model.save(dir_name=\".pretrain/GMF\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = NCF (\n",
    "    n_users=data.n_users, \n",
    "    n_items=data.n_items,\n",
    "    model_type=\"MLP\",\n",
    "    n_factors=4,\n",
    "    layer_sizes=[16,8,4],\n",
    "    n_epochs=EPOCHS,\n",
    "    batch_size=BATCH_SIZE,\n",
    "    learning_rate=1e-3,\n",
    "    verbose=10,\n",
    "    seed=SEED\n",
    ")\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Took 565.3551139831543 seconds for training.\n"
     ]
    }
   ],
   "source": [
    "with Timer() as train_time:\n",
    "    model.fit(data)\n",
    "\n",
    "print(\"Took {} seconds for training.\".format(train_time.interval))\n",
    "\n",
    "model.save(dir_name=\".pretrain/MLP\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.2 Load pre-trained GMF and MLP model for NeuMF\n",
    "`model.load`, we can set the `gmf_dir` and `mlp_dir` to store the parameters for NeuMF."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Restoring parameters from .pretrain/GMF/model.ckpt\n",
      "INFO:tensorflow:Restoring parameters from .pretrain/MLP/model.ckpt\n"
     ]
    }
   ],
   "source": [
    "model = NCF (\n",
    "    n_users=data.n_users, \n",
    "    n_items=data.n_items,\n",
    "    model_type=\"NeuMF\",\n",
    "    n_factors=4,\n",
    "    layer_sizes=[16,8,4],\n",
    "    n_epochs=EPOCHS,\n",
    "    batch_size=BATCH_SIZE,\n",
    "    learning_rate=1e-3,\n",
    "    verbose=10,\n",
    "    seed=SEED\n",
    ")\n",
    "\n",
    "model.load(gmf_dir=\".pretrain/GMF\", mlp_dir=\".pretrain/MLP\", alpha=0.5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Took 660.8611929416656 seconds for training.\n"
     ]
    }
   ],
   "source": [
    "with Timer() as train_time:\n",
    "    model.fit(data)\n",
    "\n",
    "print(\"Took {} seconds for training.\".format(train_time.interval))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.3 Compare with not pre-trained NeuMF\n",
    "\n",
    "You can use beforementioned evaluation methods to evaluate the pre-trained `NCF` Model. Usually, we will find the performance of pre-trained NCF is better than the not pre-trained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Took 2.8479833602905273 seconds for prediction.\n"
     ]
    }
   ],
   "source": [
    "with Timer() as test_time:\n",
    "\n",
    "    users, items, preds = [], [], []\n",
    "    item = list(train.itemID.unique())\n",
    "    for user in train.userID.unique():\n",
    "        user = [user] * len(item) \n",
    "        users.extend(user)\n",
    "        items.extend(item)\n",
    "        preds.extend(list(model.predict(user, item, is_list=True)))\n",
    "\n",
    "    all_predictions = pd.DataFrame(data={\"userID\": users, \"itemID\":items, \"prediction\":preds})\n",
    "\n",
    "    merged = pd.merge(train, all_predictions, on=[\"userID\", \"itemID\"], how=\"outer\")\n",
    "    all_predictions = merged[merged.rating.isnull()].drop('rating', axis=1)\n",
    "\n",
    "print(\"Took {} seconds for prediction.\".format(test_time.interval))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAP:\t0.041600\n",
      "NDCG:\t0.174057\n",
      "Precision@K:\t0.160657\n",
      "Recall@K:\t0.091654\n"
     ]
    }
   ],
   "source": [
    "eval_map2 = map_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "eval_ndcg2 = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "eval_precision2 = precision_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "eval_recall2 = recall_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)\n",
    "\n",
    "print(\"MAP:\\t%f\" % eval_map2,\n",
    "      \"NDCG:\\t%f\" % eval_ndcg2,\n",
    "      \"Precision@K:\\t%f\" % eval_precision2,\n",
    "      \"Recall@K:\\t%f\" % eval_recall2, sep='\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Record results with papermill for tests\n",
    "pm.record(\"map\", eval_map)\n",
    "pm.record(\"ndcg\", eval_ndcg)\n",
    "pm.record(\"precision\", eval_precision)\n",
    "pm.record(\"recall\", eval_recall)\n",
    "pm.record(\"map2\", eval_map2)\n",
    "pm.record(\"ndcg2\", eval_ndcg2)\n",
    "pm.record(\"precision2\", eval_precision2)\n",
    "pm.record(\"recall2\", eval_recall2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5.4 Delete pre-trained directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Did '.pretrain' exist?: False\n"
     ]
    }
   ],
   "source": [
    "save_dir = \".pretrain\"\n",
    "if os.path.exists(save_dir):\n",
    "    shutil.rmtree(save_dir)\n",
    "    \n",
    "print(\"Did \\'%s\\' exist?: %s\" % (save_dir, os.path.exists(save_dir)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reference: \n",
    "1. Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu & Tat-Seng Chua, Neural Collaborative Filtering, 2017, https://arxiv.org/abs/1708.05031\n",
    "\n",
    "2. Official NCF implementation [Keras with Theano]: https://github.com/hexiangnan/neural_collaborative_filtering\n",
    "\n",
    "3. Other nice NCF implementation [Pytorch]: https://github.com/LaceyChen17/neural-collaborative-filtering"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "reco_gpu",
   "language": "python",
   "name": "reco_gpu"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}